Abstract

The classical Hill estimator is the most popular estimator of the extreme value index of
Pareto-type distributions in the case of complete data. Einmahl et al. (2008, Bernoulli
14, no. 1, 207-227) adjusted this estimator (amongst others) to the case where the data
are subject to random censoring. They established asymptotic normality under three
restrictive conditions, which produce an additional bias to the usual one. Making use of
empirical processes theory, we relax these conditions to only one and represent the
adapted estimator in terms of Brownian bridges without the aforementioned bias.

Keywords: Brownian bridges; Extreme value index; Hill estimator; Random censoring;
Regularly varying distributions.
1. Introduction

Let $X_1, X_2, \dots, X_n$ be $n$ independent copies ($n \ge 1$) of a non-negative random variable (rv)
$X$, defined over some probability space $(\Omega, \mathcal{A}, \mathbf{P})$, with cumulative distribution function
(cdf) $F$. We assume that the distribution tail $\overline{F} := 1 - F$ is regularly varying at infinity,
with index $(-1/\gamma_1)$, notation: $\overline{F} \in \mathcal{RV}_{(-1/\gamma_1)}$. That is

$$\lim_{t\to\infty} \frac{\overline{F}(tx)}{\overline{F}(t)} = x^{-1/\gamma_1}, \quad \text{for any } x > 0, \qquad (1.1)$$

where $\gamma_1 > 0$ is called the shape parameter, tail index or extreme value index (EVI). It
plays a crucial role in the analysis of extremes as it governs the thickness of the
distribution tails: the heavier the tails, the larger $\gamma_1$. Its estimation has received a great deal
of attention for complete samples, as one might see in the textbook of Beirlant et al. (2004).
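To make the regular variation condition (1.1) concrete, the following minimal sketch evaluates the ratio of tails for one Pareto-type example. The Burr-type survival function and the names `burr_tail` and `tail_ratio` are illustrative assumptions, not objects from the paper.

```python
import math

def burr_tail(x, gamma, c=2.0):
    """Hypothetical Pareto-type tail (1 + x**c) ** (-1 / (c * gamma)) with EVI gamma."""
    return (1.0 + x ** c) ** (-1.0 / (c * gamma))

def tail_ratio(t, x, gamma):
    """Ratio tail(t * x) / tail(t); under (1.1) it approaches x ** (-1 / gamma)."""
    return burr_tail(t * x, gamma) / burr_tail(t, gamma)

gamma, x = 0.5, 3.0
limit = x ** (-1.0 / gamma)  # right-hand side of (1.1)
for t in (1.0, 10.0, 100.0, 1000.0):
    # The gap between the ratio and the limit shrinks as t grows.
    print(t, tail_ratio(t, x, gamma), limit)
```

The speed at which the ratio approaches its limit is what second-order conditions quantify.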
In this paper, we focus on the most celebrated estimator of $\gamma_1$, that was proposed by Hill
(1975):

$$\widehat{\gamma}_1^{H} := \frac{1}{k}\sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n},$$

where $X_{1,n} \le \dots \le X_{n,n}$ are the order statistics pertaining to the sample $(X_1, \dots, X_n)$ and
$k = k_n$ is an integer sequence satisfying

$$1 < k < n, \quad k \to \infty \ \text{ and } \ k/n \to 0 \ \text{ as } n \to \infty. \qquad (1.2)$$

The consistency of $\widehat{\gamma}_1^{H}$ was proved by Mason (1982) by only assuming the regular variation
condition (1.1) and its asymptotic normality was established under a suitable extra
assumption, known as the second-order regular variation condition (see de Haan and Stadtmüller;
1996 and de Haan and Ferreira; 2006, page 117).

In the analysis of lifetime, reliability or insurance data, the observations are usually
randomly censored. In other words, in many real situations the variable of interest $X$ is not
always available. An appropriate way to model this matter is to introduce a non-negative
rv $Y$, called censoring rv, independent of $X$ and then to consider the rv $Z := \min(X, Y)$
and the indicator variable $\delta := \mathbf{1}\{X \le Y\}$, which determines whether or not $X$ has
been observed. The cdf's of $Y$ and $Z$ will be denoted by $G$ and $H$ respectively. The
analysis of extreme values of randomly censored data is a new research topic to which
Reiss and Thomas (1997, Section 6.1) made a very brief reference, as a first step but with
no asymptotic results. Considering the Hall model (Hall, 1982), Beirlant et al. (2007)
proposed estimators for the EVI and high quantiles and discussed their asymptotic properties,
when the data are censored by a deterministic threshold. More recently, Einmahl et al.
(2008) adapted various EVI estimators to the case where data are censored by a random
threshold and proposed a unified method to establish their asymptotic normality. The
obtained estimators are then used in the estimation of extreme quantiles under random
censorship. Gomes and Neves (2011) also made a contribution to this field by providing a
detailed simulation study and applying the estimation procedures to some survival data
sets.

Let us now define the adapted Hill estimator of the tail index $\gamma_1$ under random
censoring, introduced by Einmahl et al. (2008). The censoring distribution is assumed to be
regularly varying too, that is $\overline{G} \in \mathcal{RV}_{(-1/\gamma_2)}$, for some $\gamma_2 > 0$. By virtue of the
independence of $X$ and $Y$, we have $\overline{H}(x) = \overline{F}(x)\overline{G}(x)$ and therefore $\overline{H} \in \mathcal{RV}_{(-1/\gamma)}$, with
$\gamma := \gamma_1\gamma_2/(\gamma_1 + \gamma_2)$. Let $\{(Z_i, \delta_i), 1 \le i \le n\}$ be a sample from the couple of rv's $(Z, \delta)$
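and $Z_{1,n} \le \dots \le Z_{n,n}$ represent the order statistics pertaining to $(Z_1, \dots, Z_n)$. If we denote
the concomitant of the $i$th order statistic by $\delta_{[i:n]}$ (i.e. $\delta_{[i:n]} = \delta_j$ if $Z_{i,n} = Z_j$), then the
adapted Hill estimator of the tail index $\gamma_1$ is defined by

$$\widehat{\gamma}_1^{(H,c)} := \frac{\widehat{\gamma}^{H}}{\widehat{p}}, \qquad (1.3)$$

where

$$\widehat{\gamma}^{H} := \frac{1}{k}\sum_{i=1}^{k} \log Z_{n-i+1,n} - \log Z_{n-k,n} \qquad (1.4)$$

and

$$\widehat{p} := \frac{1}{k}\sum_{i=1}^{k} \delta_{[n-i+1:n]}, \qquad (1.5)$$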

with $k = k_n$ satisfying (1.2). Roughly speaking, the adapted Hill estimator is equal to
the quotient of the classical Hill estimator to the proportion of non-censored data. The
asymptotic normality of $\widehat{\gamma}_1^{(H,c)}$ requires the well-known second-order regular variation,
that we express in terms of the tail quantile function $U_H(x) := H^{-1}(1 - x^{-1})$, $x > 1$,
of cdf $H$, where $H^{-1}(s) := \inf\{x : H(x) \ge s\}$, $0 < s < 1$, denotes the quantile function
pertaining to $H$. There exist a constant $\tau < 0$ and a function $A \in \mathcal{RV}_{\tau}$, not changing sign
near infinity, such that

$$\lim_{t\to\infty} \frac{U_H(tx)/U_H(t) - x^{\gamma}}{A(t)} = x^{\gamma}\,\frac{x^{\tau} - 1}{\tau}, \quad \text{for any } x > 0. \qquad (1.6)$$
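As an illustration of the adapted Hill estimator (1.3)-(1.5), here is a minimal sketch on simulated data. It assumes exact Pareto laws for $X$ and $Y$ (so that $\gamma_1$, $\gamma_2$ and $p$ are known), and the function names `simulate_censored_pareto`, `hill` and `adapted_hill` are hypothetical helpers introduced only for this example.

```python
import math
import random

def simulate_censored_pareto(n, gamma1, gamma2, seed=0):
    """Draw (Z_i, delta_i) with X ~ Pareto(1/gamma1), Y ~ Pareto(1/gamma2) (assumed model)."""
    rng = random.Random(seed)
    z, delta = [], []
    for _ in range(n):
        x = (1.0 - rng.random()) ** (-gamma1)  # inverse-cdf sampling, tail x**(-1/gamma1)
        y = (1.0 - rng.random()) ** (-gamma2)  # censoring variable
        z.append(min(x, y))
        delta.append(1 if x <= y else 0)       # 1 means X was observed
    return z, delta

def hill(sorted_z, k):
    """Classical Hill estimator (1.4) from the k largest values of an ascending sample."""
    n = len(sorted_z)
    top = sorted_z[n - k:]
    return sum(math.log(v) for v in top) / k - math.log(sorted_z[n - k - 1])

def adapted_hill(z, delta, k):
    """Return (adapted estimate (1.3), Hill estimate (1.4), proportion p_hat (1.5))."""
    pairs = sorted(zip(z, delta))              # order statistics with their concomitants
    zs = [a for a, _ in pairs]
    ds = [d for _, d in pairs]
    gamma_h = hill(zs, k)
    p_hat = sum(ds[len(ds) - k:]) / k          # share of uncensored among the top k
    return gamma_h / p_hat, gamma_h, p_hat

gamma1, gamma2 = 0.5, 1.5
z, delta = simulate_censored_pareto(20000, gamma1, gamma2, seed=1)
gamma1_hat, gamma_h, p_hat = adapted_hill(z, delta, k=2000)
print(gamma1_hat, gamma_h, p_hat)
```

In this assumed model $p = \gamma_2/(\gamma_1+\gamma_2) = 0.75$ and $\gamma = 0.375$, so the three printed values should be close to 0.5, 0.375 and 0.75 respectively, up to sampling error.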

Our work is mainly motivated by the asymptotic normality result established by Einmahl et al.
(2008), for which the following notations were needed:

$$p(z) := \frac{\overline{G}(z)f(z)}{\overline{G}(z)f(z) + \overline{F}(z)g(z)}, \quad z > 0, \qquad (1.7)$$

and

$$\lim_{z\to\infty} p(z) = \gamma_2/(\gamma_1 + \gamma_2) =: p \in (0, 1), \qquad (1.8)$$

where $f$ and $g$ denote the respective densities of cdf's $F$ and $G$ (if they exist). The three
assumptions (that we call extra conditions on cdf $H$) listed below are required as well.

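• C1: $\sqrt{k}\,A(n/k) \to \alpha_1 < \infty$.

• C2: $\dfrac{1}{\sqrt{k}}\sum_{i=1}^{k}\left[p\left(H^{-1}\left(1 - \dfrac{i}{n}\right)\right) - p\right] \to \alpha_2 < \infty$.

• C3: $\sqrt{k}\,\sup_{\{1-k/n \le t < 1,\ |s-t| \le \nu\sqrt{k}/n,\ s < 1\}} \left|p\left(H^{-1}(s)\right) - p\left(H^{-1}(t)\right)\right| \to 0$, for all $\nu > 0$.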

Notice that in Einmahl et al. (2008), the first assumption is given in a more general form
which reduces in our case to C1, expressed in terms of the function $A(t)$ of (1.6). Theorem
1 in Einmahl et al. (2008) says that, under the assumptions above, we have

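$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) \overset{d}{\to} \mathcal{N}\left(\frac{1}{p}\left(\frac{\alpha_1}{1-\tau} - \gamma_1\alpha_2\right),\ \frac{\gamma_1^2}{p}\right), \quad \text{as } n \to \infty. \qquad (1.9)$$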

Hypothesis C1 is a standard condition in extreme value theory (EVT), necessary to
derive the asymptotic normality of the classical Hill estimator, while C2 and C3 are less
usual. Indeed, we are accustomed in the EVT context to have a centered Gaussian
limiting distribution, under C1 with $\alpha_1 = 0$ (see Theorem 3.2.5 and Remark 3.2.6 in
de Haan and Ferreira; 2006, page 74). This obviously is not the case in (1.9), as
assumption C2 produces an asymptotic bias equal to $-\gamma_1\alpha_2/p$. Assumption C3 is
not very realistic and is hard to verify.

Let us briefly explain why conditions C2 and C3 were introduced. It is well-known that,
under the usual condition C1, Hill's estimator $\widehat{\gamma}^{H}$ of the tail index $\gamma$ of cdf $H$ is
asymptotically Gaussian with mean $\alpha_1/(1-\tau)$ and variance $\gamma^2$ (see de Haan and Peng; 1998). By
definition, the observed rv's $Z$ and $\delta$ are dependent, which implies the dependence of the
order statistics and the corresponding concomitants $\{\delta_{[j:n]}\}_{1\le j\le n}$, and therefore
the related statistics $\widehat{\gamma}^{H}$ and $\widehat{p}$ are dependent. To establish the asymptotic normality of
$\widehat{\gamma}_1^{(H,c)}$, the dependence structure between $\widehat{\gamma}^{H}$ and $\widehat{p}$ as well as the limiting distribution of
the latter are required. In order to overcome the difficulties that lie under this dependence
structure, Einmahl et al. (2008) proceeded as follows. They introduced a statistic $\widetilde{p}$,
independent of $\widehat{\gamma}^{H}$, such that $\sqrt{k}(\widetilde{p} - p) \overset{d}{\to} \mathcal{N}(0, p(1-p))$, along with another statistic, that
we denote by $\widehat{p}^{*}$, having the same distribution as the initial $\widehat{p}$. Then they derived their
asymptotic normality result (1.9) from the following decomposition

By construction, the first and second terms of the right-hand side are independent and
asymptotically Gaussian whereas the third one converges in probability to $-\gamma_1\alpha_2/p$,
provided that both C2 and C3 hold. In summary, the first term of the asymptotic bias
of $\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right)$, given in (1.9), was produced by assumption C1 while the second
resulted from hypotheses C2 and C3. The limiting distribution (1.9) was not obtained
directly from $\widehat{\gamma}_1^{(H,c)}$ but out of an approximation of $\widehat{\gamma}_1^{(H,c)}$, meaning that, in their approach,
Einmahl et al. (2008) skirted the dependence issue between $\widehat{\gamma}^{H}$ and $\widehat{p}$, which would have
influenced the values of the asymptotic bias and variance in (1.9).



In this paper, we adopt an alternative approach that uses only the second-order condition
of regular variation (1.6) and C1, without any additional assumptions. Our method is
based on the uniform empirical processes (see Shorack and Wellner; 1986) and the related
weak approximations given by Csörgő et al. (1986). More precisely, for a given sequence
of independent identically distributed (iid) (0,1)-uniform rv's, we represent both $\widehat{\gamma}^{H}$ and
$\widehat{p}$ by the same uniform empirical process and then we approximate them by a sequence
of Brownian bridges. There is no surprise in the fact that the parameters of the limiting
distributions in (1.9) and (2.10) are distinct. Indeed, Einmahl et al. (2008) based their
arguments on the new statistic $\widetilde{p}$, instead of the initial $\widehat{p}$, which produced additional
conditions on the sample fraction $k$. As a consequence, they ended up with asymptotic
biases and variances that are different from ours. On the other hand, Einmahl et al.
(2008) used the function $p$ whose definition (1.7) required the absolute continuity of cdf's
$F$ and $G$. This additional condition is not imposed in our approach as we introduce the
function

$$z \mapsto \overline{H}^{1}(z)/\overline{H}(z),$$

where $\overline{H}^{1}(z) := \mathbf{P}(Z > z, \delta = 1)$, converging to $p$ at infinity as well (see Lemma 4.1).
The rest of the paper is organized as follows. In Section 2, we state our main result which
we prove in Section 3. Some results that are instrumental to our needs are gathered in
two lemmas in the Appendix.



2. Main result

Our main result, which consists in the asymptotic representation, in terms of Brownian
bridges, of the EVI estimator in the presence of random censoring, is stated in the following
theorem.

Theorem 2.1. Assume that the second-order condition (1.6) holds. Let $k := k_n$ be an
integer sequence satisfying (1.2) and C1. Then there exists a sequence of Brownian bridges
$\{B_n(s),\ 0 \le s \le 1\}$ such that, as $n \to \infty$,



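$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) = -\frac{\gamma_1}{p}\sqrt{\frac{n}{k}}\,B_n\left(\frac{pk}{n}\right) - \left(\gamma_1 + \frac{\gamma}{p}\right)\sqrt{\frac{n}{k}}\,B_n\left(1 - \frac{k}{n}\right) + \frac{\gamma}{p}\sqrt{\frac{n}{k}}\int_0^1 s^{-1}B_n\left(1 - \frac{ks}{n}\right)ds + \frac{\alpha_1}{p(1-\tau)} + o_p(1).$$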
Consequently

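$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) \overset{d}{\to} \mathcal{N}\left(\frac{\alpha_1}{p(1-\tau)},\ \frac{\gamma_1^2}{p} + 2\gamma_1^2\right), \quad \text{as } n \to \infty, \qquad (2.10)$$

where $p = \gamma_2/(\gamma_1 + \gamma_2)$.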



3. Proof 

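It is clear that we have the following decomposition

$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) = \frac{1}{\widehat{p}}\sqrt{k}\left(\widehat{\gamma}^{H} - \gamma\right) - \frac{\gamma_1}{\widehat{p}}\sqrt{k}\left(\widehat{p} - p\right). \qquad (3.11)$$

We start our proof by focusing on $\sqrt{k}(\widehat{p} - p)$, then we will treat $\sqrt{k}(\widehat{\gamma}^{H} - \gamma)$. For $z > 0$,
we have

$$\overline{H}^{1}(z) = \int_z^{\infty} \overline{G}(y)\,dF(y) = H^{1}(\infty) - H^{1}(z),$$

where $H^{1}(u) := \mathbf{P}\{Z \le u, \delta = 1\}$, $u > 0$. The empirical counterpart of function $\overline{H}^{1}$
pertaining to the sample $\{(Z_1, \delta_1), \dots, (Z_n, \delta_n)\}$ is

$$\overline{H}_n^{1}(z) := \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(Z_i > z)\,\delta_i = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(Z_{i,n} > z)\,\delta_{[i:n]}.$$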

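By taking $z = Z_{n-k,n}$, we get

$$\overline{H}_n^{1}(Z_{n-k,n}) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(Z_{i,n} > Z_{n-k,n})\,\delta_{[i:n]} = \frac{1}{n}\sum_{i=1}^{k} \delta_{[n-i+1:n]}.$$

It follows that

$$\widehat{p} = \frac{n}{k}\,\overline{H}_n^{1}(Z_{n-k,n}). \qquad (3.12)$$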
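Identity (3.12) is purely algebraic, so it can be checked directly on simulated data. The sketch below assumes Pareto-distributed $X$ and $Y$ (hence no ties, almost surely); the helper names `empirical_sub_tail` and `top_k_uncensored_fraction` are hypothetical.

```python
import random

def empirical_sub_tail(z, delta, threshold):
    """Empirical version of P(Z > threshold, delta = 1): (1/n) * sum 1(Z_i > threshold) * delta_i."""
    return sum(d for zi, d in zip(z, delta) if zi > threshold) / len(z)

def top_k_uncensored_fraction(z, delta, k):
    """Proportion of uncensored observations among the k largest Z values, as in (1.5)."""
    pairs = sorted(zip(z, delta))
    return sum(d for _, d in pairs[len(pairs) - k:]) / k

rng = random.Random(7)
n, k = 500, 50
x = [rng.paretovariate(2.0) for _ in range(n)]   # variable of interest (assumed Pareto)
y = [rng.paretovariate(1.0) for _ in range(n)]   # censoring variable (assumed Pareto)
z = [min(a, b) for a, b in zip(x, y)]
delta = [1 if a <= b else 0 for a, b in zip(x, y)]

threshold = sorted(z)[n - k - 1]                 # order statistic Z_{n-k,n}
lhs = top_k_uncensored_fraction(z, delta, k)
rhs = (n / k) * empirical_sub_tail(z, delta, threshold)
print(lhs, rhs)                                  # the two values agree up to rounding
```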

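Csörgő et al. (1986) have constructed a probability space $(\Omega, \mathcal{A}, \mathbf{P})$ carrying an infinite
sequence $\xi_1, \xi_2, \dots$ of independent rv's uniformly distributed on (0,1) and a sequence
of Brownian bridges $\{B_n(s),\ 0 \le s \le 1\}$, such that for the uniform empirical process
$\alpha_n(t) = \sqrt{n}\,(U_n(t) - t)$, $0 \le t \le 1$, where $U_n(t) = n^{-1}\#\{i : 1 \le i \le n,\ \xi_i \le t\}$, we have

$$\sup_{1/n \le t \le 1-1/n} \frac{|\alpha_n(t) - B_n(t)|}{(t(1-t))^{1/2-\nu}} = O_p(n^{-\nu}), \quad \text{as } n \to \infty, \qquad (3.13)$$

where $0 \le \nu < 1/4$. It is clear that

$$\overline{H}_n^{1}(z) = U_n\left(\overline{H}^{1}(z)\right), \quad 0 < \overline{H}^{1}(z) < H^{1}(\infty).$$

Then, without loss of generality, equation (3.12) may be rewritten into

$$\widehat{p} = \frac{n}{k}\,U_n\left(\overline{H}^{1}(Z_{n-k,n})\right).$$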



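Observe now that

$$\sqrt{k}\,(\widehat{p} - p) = \frac{n}{\sqrt{k}}\left\{U_n\left(\overline{H}^{1}(Z_{n-k,n})\right) - \overline{H}^{1}(Z_{n-k,n})\right\} + \sqrt{k}\left\{\frac{n}{k}\overline{H}^{1}(Z_{n-k,n}) - p\right\}. \qquad (3.14)$$

From now on, we will use the notation $Y_n^{(1)} \approx Y_n^{(2)}$ to say that $Y_n^{(1)} = (1 + o_p(1))\,Y_n^{(2)}$,
as $n \to \infty$. Since $\overline{H} \in \mathcal{RV}_{(-1/\gamma)}$ then $Z_{n-k,n} \to \infty$, and it follows from Lemma 4.1 that

$$\overline{H}^{1}(Z_{n-k,n}) \approx p\,\overline{H}(Z_{n-k,n}), \quad \text{as } n \to \infty.$$

Since $Z_{n-k,n} = H^{-1}(\xi_{n-k,n})$, then $\overline{H}^{1}(Z_{n-k,n}) \approx p\,\overline{H}\left(H^{-1}(\xi_{n-k,n})\right)$ which is equal to
$p\,(1 - \xi_{n-k,n})$ in distribution. Therefore, $\frac{n}{k}\overline{H}^{1}(Z_{n-k,n}) - p \approx p\left(\frac{n}{k}(1 - \xi_{n-k,n}) - 1\right)$, i.e.

$$\frac{n}{k}\overline{H}^{1}(Z_{n-k,n}) - p \approx p\,\frac{n}{k}\left(U_n(\xi_{n-k,n}) - \xi_{n-k,n}\right). \qquad (3.15)$$

In view of representations (3.14) and (3.15), we write

$$\sqrt{k}\,(\widehat{p} - p) = \sqrt{\frac{n}{k}}\,\alpha_n\left(\overline{H}^{1}(Z_{n-k,n})\right) + (1 + o_p(1))\,p\,\sqrt{\frac{n}{k}}\,\alpha_n(\xi_{n-k,n}). \qquad (3.16)$$

Since $\overline{H}^{1}(Z_{n-k,n}) \approx p\,(1 - \xi_{n-k,n})$ and $1 - \xi_{n-k,n} \approx k/n$ (see, e.g., Lemma 2.2.3 in
de Haan and Ferreira; 2006), then $\overline{H}^{1}(Z_{n-k,n}) \approx pk/n$. The conditions (1.2) guarantee
that $\mathbf{P}\left(\overline{H}^{1}(Z_{n-k,n}) \in [n^{-1}, 1 - n^{-1}]\right) \to 1$ as $n \to \infty$, because for all large $n$, $pk/n$
belongs to $[n^{-1}, 1 - n^{-1}]$. Applying (3.13), we get as $n \to \infty$,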



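$$\frac{\left|\alpha_n\left(\overline{H}^{1}(Z_{n-k,n})\right) - B_n\left(\overline{H}^{1}(Z_{n-k,n})\right)\right|}{(k/n)^{1/2-\nu}} = O_p(n^{-\nu}).$$

Likewise, we have

$$\frac{\left|\alpha_n(\xi_{n-k,n}) - B_n(\xi_{n-k,n})\right|}{(k/n)^{1/2-\nu}} = O_p(n^{-\nu}), \quad \text{as } n \to \infty.$$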

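From Lemma 4.2, we have as $n \to \infty$,

$$B_n\left(\overline{H}^{1}(Z_{n-k,n})\right) \approx B_n(pk/n) \quad \text{and} \quad B_n(\xi_{n-k,n}) \approx B_n(1 - k/n).$$

It follows that

$$\sqrt{\frac{n}{k}}\,\alpha_n\left(\overline{H}^{1}(Z_{n-k,n})\right) = \sqrt{\frac{n}{k}}\,B_n(pk/n) + O_p(k^{-\nu}), \quad \text{as } n \to \infty,$$

and

$$\sqrt{\frac{n}{k}}\,\alpha_n(\xi_{n-k,n}) = \sqrt{\frac{n}{k}}\,B_n(1 - k/n) + O_p(k^{-\nu}), \quad \text{as } n \to \infty.$$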



Combining the results above with equation (3.16) and using the fact that $k^{-\nu} \to 0$, we
get as $n \to \infty$,

$$\sqrt{k}\,(\widehat{p} - p) = \sqrt{\frac{n}{k}}\,B_n(pk/n) + p\,\sqrt{\frac{n}{k}}\,B_n(1 - k/n) + o_p(1). \qquad (3.17)$$

It is readily checked that the variance of $\sqrt{n/k}\,B_n(pk/n) + p\,\sqrt{n/k}\,B_n(1 - k/n)$ tends to
$p^2 + p$ as $n \to \infty$, meaning that $\sqrt{k}\,(\widehat{p} - p) \overset{d}{\to} \mathcal{N}(0, p^2 + p)$.
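The variance limit stated above can be checked numerically from the Brownian bridge covariance $\mathbf{E}[B(s)B(t)] = \min(s,t) - st$. The following is a minimal sketch; the function names `bridge_cov` and `variance_of_limit` and the chosen values of $p$, $n$ and $k$ are illustrative assumptions.

```python
def bridge_cov(s, t):
    """Covariance of a standard Brownian bridge at times s and t in [0, 1]."""
    return min(s, t) - s * t

def variance_of_limit(p, n, k):
    """Variance of sqrt(n/k) * B(p*k/n) + p * sqrt(n/k) * B(1 - k/n) for a Brownian bridge B."""
    s = p * k / n
    t = 1.0 - k / n
    scale = n / k
    return scale * (
        bridge_cov(s, s)                  # variance of the first term
        + p * p * bridge_cov(t, t)        # variance of the second term
        + 2.0 * p * bridge_cov(s, t)      # cross term, negligible when k/n is small
    )

p = 0.75
for n, k in ((10**4, 10**2), (10**6, 10**3), (10**8, 10**3)):
    # As k/n decreases, the value approaches p**2 + p.
    print(n, k, variance_of_limit(p, n, k), p * p + p)
```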



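Let us now consider the term $\sqrt{k}\,(\widehat{\gamma}^{H} - \gamma)$ of the decomposition (3.11), where $\widehat{\gamma}^{H}$ is Hill's
estimator of the tail index $\gamma$ of cdf $H$. Since $\{Z_i\}_{i=1}^{n} = \{H^{-1}(\xi_i)\}_{i=1}^{n}$, it follows that

$$\widehat{\gamma}^{H} = \frac{1}{k}\sum_{i=1}^{k} \log H^{-1}(\xi_{n-i+1,n}) - \log H^{-1}(\xi_{n-k,n}).$$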

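Making use of Theorems 2.3 and 2.4 in Csörgő et al. (1985) (see also de Haan and Ferreira;
2006, pages 162-163), we conclude that

$$\sqrt{k}\left(\widehat{\gamma}^{H} - \gamma\right) = -\gamma\sqrt{\frac{n}{k}}\,B_n\left(1 - \frac{k}{n}\right) + \gamma\sqrt{\frac{n}{k}}\int_0^1 s^{-1}B_n\left(1 - \frac{ks}{n}\right)ds + \frac{\alpha_1}{1-\tau} + o_p(1). \qquad (3.18)$$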

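By virtue of their asymptotic normality, the quantities $\sqrt{k}\,(\widehat{\gamma}^{H} - \gamma)$ and $\sqrt{k}\,(\widehat{p} - p)$ are
bounded in probability. Therefore, we infer from the decomposition (3.11) that as $n \to \infty$,

$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) \approx \frac{1}{p}\sqrt{k}\left(\widehat{\gamma}^{H} - \gamma\right) - \frac{\gamma_1}{p}\sqrt{k}\,(\widehat{p} - p). \qquad (3.19)$$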

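Finally, using (3.17) and (3.18) in (3.19) yields as $n \to \infty$,

$$\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right) = -\frac{\gamma_1}{p}\sqrt{\frac{n}{k}}\,B_n\left(\frac{pk}{n}\right) - \left(\gamma_1 + \frac{\gamma}{p}\right)\sqrt{\frac{n}{k}}\,B_n\left(1 - \frac{k}{n}\right) + \frac{\gamma}{p}\sqrt{\frac{n}{k}}\int_0^1 s^{-1}B_n\left(1 - \frac{ks}{n}\right)ds + \frac{\alpha_1}{p(1-\tau)} + o_p(1).$$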



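By an elementary calculation, using $\gamma = p\gamma_1$, we obtain

$$\frac{\gamma_1^2}{p} + \left(\gamma_1 + \frac{\gamma}{p}\right)^2 + \frac{2\gamma^2}{p^2} - \frac{2\gamma}{p}\left(\gamma_1 + \frac{\gamma}{p}\right) = \frac{\gamma_1^2}{p} + 2\gamma_1^2$$

as the asymptotic variance of $\sqrt{k}\left(\widehat{\gamma}_1^{(H,c)} - \gamma_1\right)$. Replacing $p$ by $\gamma_2/(\gamma_1 + \gamma_2)$ completes
the proof.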

Concluding notes 

The present work provides a Gaussian limiting distribution for the
estimator of the shape parameter of a heavy-tailed distribution, under random censoring.
Our approach, based on the approximation of the uniform empirical process by a
sequence of Brownian bridges, enables us to get rid of some restrictive assumptions made by
Einmahl et al. (2008) on the underlying distribution. This makes the selection of the
optimal sample fraction $k$ easier, by only using the classical techniques based on the
second-order condition of regular variation for complete data (see de Haan and Ferreira;
2006, pages 77-78). Moreover, the representation of the estimator of the tail index in
terms of Brownian bridges will be of great use in the statistical inference on quantities
related to extreme values in the context of censored data, such as high quantiles and risk
measures. The extension of our methodology to any real EVI will be the object of
future work.

4. Appendix 

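Lemma 4.1. Let $F$ and $G$ be cdf's with $\overline{F} \in \mathcal{RV}_{(-1/\gamma_1)}$ and $\overline{G} \in \mathcal{RV}_{(-1/\gamma_2)}$, $\gamma_1, \gamma_2 > 0$.
Then

$$\lim_{z\to\infty} \frac{\overline{H}^{1}(z)}{\overline{H}(z)} = p.$$

Proof. Recall that $\overline{H}^{1}(z) = \int_z^{\infty} \overline{G}(x)\,dF(x)$. By using the change of variables $\overline{F}(x) = 1/t$,
we write $\overline{H}^{1}(z) = \int_{1/\overline{F}(z)}^{\infty} t^{-2}\,\overline{G}\left(F^{-1}(1 - 1/t)\right)dt$. It is easy to verify that the function
$t \mapsto R(t) := t^{-1}\,\overline{G}\left(F^{-1}(1 - 1/t)\right)$ belongs to $\mathcal{RV}_{(-(1+\gamma_1/\gamma_2))}$. Making use of Theorem 1.2.2
in de Haan and Ferreira (2006, assertion 1.2.6, page 20), we infer that for all large $z$

$$\overline{H}^{1}(z) = \int_{1/\overline{F}(z)}^{\infty} t^{-1}R(t)\,dt \sim (1 + \gamma_1/\gamma_2)^{-1}\,R\left(1/\overline{F}(z)\right).$$

We have $1/(1 + \gamma_1/\gamma_2) = p$ and $R\left(1/\overline{F}(z)\right) = \overline{F}(z)\,\overline{G}(z)$. Hence $\overline{H}^{1}(z) \sim p\,\overline{H}(z)$, as
$z \to \infty$, as sought. □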
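Lemma 4.1 can be illustrated numerically for one particular pair of distributions. The sketch below assumes $\overline{F}(x) = (1+x)^{-1/\gamma_1}$ and $\overline{G}(x) = (1+x^2)^{-1/(2\gamma_2)}$ (choices made only for this example) and approximates $\overline{H}^{1}(z) = \int_z^\infty \overline{G}\,dF$ by a midpoint rule after the substitution $y = z e^{u}$; all function names are hypothetical.

```python
import math

GAMMA1, GAMMA2 = 0.5, 1.5
P_LIMIT = GAMMA2 / (GAMMA1 + GAMMA2)  # limit p from (1.8)

def F_bar(x):
    """Assumed tail of X: (1 + x) ** (-1 / gamma1)."""
    return (1.0 + x) ** (-1.0 / GAMMA1)

def f(x):
    """Density of X corresponding to F_bar."""
    return (1.0 / GAMMA1) * (1.0 + x) ** (-1.0 / GAMMA1 - 1.0)

def G_bar(x):
    """Assumed tail of the censoring variable Y: (1 + x**2) ** (-1 / (2 * gamma2))."""
    return (1.0 + x * x) ** (-1.0 / (2.0 * GAMMA2))

def H1_bar(z, upper=40.0, steps=20000):
    """Midpoint-rule approximation of the integral of G_bar(y) f(y) over (z, infinity)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        y = z * math.exp((j + 0.5) * h)   # substitution y = z * exp(u), dy = y du
        total += G_bar(y) * f(y) * y
    return total * h

def ratio(z):
    """Ratio H1_bar(z) / H_bar(z), with H_bar = F_bar * G_bar by independence."""
    return H1_bar(z) / (F_bar(z) * G_bar(z))

for z in (1.0, 10.0, 100.0, 1e4):
    print(z, ratio(z), P_LIMIT)
```

For this pair the ratio is not constant in $z$, which makes the convergence to $p$ visible, unlike the exact Pareto case where the ratio equals $p$ for every $z$.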

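Lemma 4.2. Let $k = k_n$ be an integer sequence satisfying (1.2) and $\{B_n(s),\ 0 \le s \le 1\}$
be a sequence of Brownian bridges, defined in the probability space $(\Omega, \mathcal{A}, \mathbf{P})$. Then under
the assumptions of Lemma 4.1, for all large $n$ we have

$$\text{(i)} \ B_n\left(\overline{H}^{1}(Z_{n-k,n})\right) \approx B_n(pk/n) \quad \text{and} \quad \text{(ii)} \ B_n(\xi_{n-k,n}) \approx B_n(1 - k/n).$$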

Proof. We only give the proof of assertion (i), the second follows by similar arguments. We
will use techniques analogous to those used in the proof of Lemma 7.2.1 in Csörgő and Révész
(1981, page 258). To this end, we set $a_n := pk/n$ and $\varepsilon_n := \overline{H}^{1}(Z_{n-k,n})/(pk/n) - 1$. Define a
sequence of Wiener processes $\{W_n(s),\ 0 \le s \le 1\}$ so that $B_n(s) = W_n(s) - sW_n(1)$. Let
$0 < \epsilon < 1$, then there exist $\delta > 0$ and an integer $N$ such that for all $n > N$,

$$\mathbf{P}\left(|\varepsilon_n| \le \delta\right) > 1 - \epsilon/3.$$

Observe that 



This may be rewritten into 

Wn (a„ + e„an) - Wn (an) Wn (an + ^nan 



!)• 



In the set {|e„| < 5} , we have 

Wn {Zn-k,n)) Wn (On) 



H {Zn-k,n) 



< 



\Wn (an + enOn) - Wn (a^) 



+ 



\Wn{an + 



It is easy to verify that 



Wn {Zn-k,n)) Wn (a„) 



H {Zn-k,n) 



< and 



< sup sup 

0<a;<a„ 0<s<5a„ 



\Wn{x + s)-Wn{x)\ 



+ 



1-5 



sup 

0<x<(l+5)a„ 



\Wn{x)\ 



(4.20) 



Let ?7 > 0, then for all large n, we have 



P 



WniH (Zn-k, 



Wn{an) 



H {Zn-k,n) 



>-,)<./3 + p(/„>|)+p(J£^J„>|). 



Since J„ is bounded in probability, then from (4.20) we infer that P (^jz^Jn > f) ^ 
e/3. On the other hand, Lemma 1.2.1 in Csorgo and Revesz (1981, page 29) yields 
P (/„ > f ) < e/3. It follows that 
This implies that

$$W_n\left(\overline{H}^{1}(Z_{n-k,n})\right) = (1 + o_p(1))\,\sqrt{\frac{\overline{H}^{1}(Z_{n-k,n})}{a_n}}\;W_n(a_n).$$

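From Lemma 4.1 we have $\overline{H}^{1}(Z_{n-k,n}) = (1 + o_p(1))\,p\,\overline{H}(Z_{n-k,n})$. Since $\overline{H}(Z_{n-k,n}) = 1 - \xi_{n-k,n}$,
then $\overline{H}^{1}(Z_{n-k,n})/a_n \to 1$, and it follows that $W_n\left(\overline{H}^{1}(Z_{n-k,n})\right) = (1 + o_p(1))\,W_n(a_n)$.
Thus $B_n\left(\overline{H}^{1}(Z_{n-k,n})\right) = (1 + o_p(1))\,B_n(a_n)$, which completes the proof. □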